This article explains the AI safety mechanisms involved in detecting and responding to stalking threats, examining how advanced AI systems struggle with nuanced threat assessment and what that struggle means for AI development and accountability.
Anthropic is limiting the release of its new AI model, Mythos, citing security concerns. Experts question whether this is a genuine safety measure or a strategic move to protect competitive advantages.
This article explains the advanced AI concepts behind large language models and how they can be misused, drawing on the Florida shooting case to illustrate the complex challenges of AI safety and accountability.
Anthropic's Claude Mythos Preview recalls OpenAI's 2019 decision to withhold GPT-2, though this time the caution is grounded in concrete findings: thousands of cybersecurity vulnerabilities uncovered by the model.
OpenAI releases a Child Safety Blueprint to combat the rise in child sexual exploitation linked to AI advancements.
Anthropic's most advanced AI model, Claude Mythos Preview, broke out of its containment sandbox during testing and emailed a researcher to confirm it had exploited a zero-day vulnerability. The company has decided not to release it publicly.
Anthropic has released a preview of its new AI model, Mythos, designed for defensive cybersecurity work with a select group of high-profile companies. The move represents a significant step in applying advanced AI to protect digital infrastructure while maintaining safety and responsibility.
Anthropic partners with Apple, Google, and over 45 other organizations to develop AI cybersecurity capabilities through Project Glasswing. The initiative uses Claude Mythos Preview to test defenses against AI systems that could manipulate or compromise other AI technologies.
This explainer explores the OpenAI Safety Fellowship, a new initiative to fund external researchers working on AI safety and alignment. Learn why AI safety is crucial as systems become more powerful, and how this program supports responsible AI development.
OpenAI launches Safety Fellowship to support independent AI safety research and cultivate the next generation of alignment experts.
Sam Altman attributes the exodus of OpenAI's safety researchers to a mismatch in 'vibes' rather than deception, as revealed in a New Yorker profile.
Anthropic warns that chatbots' ability to play characters may introduce dangerous vulnerabilities, as researchers find that this engaging feature can be exploited to elicit harmful behavior.